library(tidyverse)
library(readxl)
path <- "900-999/928/928 Summarize Status Hours.xlsx"
input <- read_excel(path, range = "A2:A27")
test <- read_excel(path, range = "C2:D7")
result <- input %>%
mutate(
Status = str_extract(Data, "Completed|In-Progress|On-Hold|Pending"),
Hours = str_extract(Data, "\\d{1,3}(?:\\.\\d+)?(?=\\D*$)") |> as.numeric()
) %>%
summarise(`Total Hours` = sum(Hours), .by = Status) %>%
arrange(desc(`Total Hours`)) %>%
bind_rows(
summarise(., Status = "Total", `Total Hours` = sum(`Total Hours`))
)
all.equal(result, test)
# [1] TRUEExcel BI - Excel Challenge 928
excel-challenges
excel-formulas
🔰 Extract status labels and trailing hour values from text, then summarize hours by status with a total row.

Challenge Description
🔰 Extract the status and the time given in hours from each text row, summarize the total hours by status, and append a final total row. The supplied implementations solve this by extracting the status label and the trailing numeric hours value from each string.
Solutions
- Logic: Extract one known status label and the final numeric value from each string, aggregate hours by status, then append a grand total.
- Strengths: The solution is compact and does not need brittle delimiter-based parsing.
- Areas for Improvement: The prompt mentions several suffix variants for hours, but the implemented logic is best understood as extracting the trailing numeric hours value.
- Gem: Two stable regex anchors are enough to turn semi-structured prose into a summary table.
import pandas as pd
path = "900-999/928/928 Summarize Status Hours.xlsx"
input = pd.read_excel(path, usecols="A", skiprows=1, nrows=26)
test = pd.read_excel(path, usecols="C:D", skiprows=1, nrows=5)
result = (
input
.assign(
Status=lambda d: d["Data"].str.extract(r'(Completed|In-Progress|On-Hold|Pending)'),
Hours=lambda d: d["Data"].str.extract(r'(\d{1,3}(?:\.\d+)?)(?=\D*$)').astype(float)
)
.groupby("Status", as_index=False)["Hours"]
.sum()
)
result = pd.concat(
[result, pd.DataFrame({"Status": ["Total"], "Hours": [result["Hours"].sum()]})],
ignore_index=True
)
result = result.rename(columns={"Hours": "Total Hours"})
print(result.equals(test))
# TrueThe Python version mirrors the same idea with str.extract(): one regex for the status and one regex for the final numeric chunk. Once those two fields are isolated, the rest is just a grouped sum plus a one-row total.
Difficulty Level
Easy / Medium
The aggregation is simple. The only real challenge is spotting the right text anchors to extract reliable fields from messy strings.